System and method for decoding traffic over proxy servers

ABSTRACT

Methods and systems for applying surveillance to client computers that communicate via proxy servers. A decoding system accepts communication packets from a communication network. Based on the received packets, the decoding system identifies that a certain client computer conducts a communication session with a target server via a proxy server. The decoding system processes the packets so as to correlate the identity of the client computer with the identity of the target server. The correlated identities may comprise, for example, Internet Protocol (IP) addresses or Uniform Resource Locators (URLs).

RELATED APPLICATIONS

This application is a continuation of U.S. patent application entitled “System and Method for Decoding Traffic Over Proxy Servers,” Ser. No. 15/602,477, filed on May 23, 2017, which is a continuation of U.S. patent application Ser. No. 13/358,476, filed on Jan. 25, 2012, which in turn claims priority to IL 210,899 filed on Jan. 27, 2011, the disclosures of which are incorporated by reference herein.

FIELD OF THE DISCLOSURE

The present disclosure relates generally to computer networks, and particularly to methods and systems for decoding network traffic exchanged over proxy servers.

BACKGROUND OF THE DISCLOSURE

Client computers in computer networks sometimes access servers via intermediary proxy servers. Examples of proxy servers are Hyper-Text Transfer Protocol (HTTP) proxy servers, also referred to as Web proxy servers, and SOCKS proxy servers. Proxy servers may be employed for a variety of reasons, such as for hiding (“anonymizing”) the identity of the client, for communicating behind firewalls, or for improving network performance.

SUMMARY OF THE DISCLOSURE

An embodiment that is described herein provides a method including receiving communication packets from a communication network and identifying, based on the received communication packets, that a client computer conducts a communication session with a target server via a proxy server. A first identity of the client computer is correlated with a second identity of the target server by processing the communication packets.

In some embodiments, an indication of the second identity is encoded in one or more of the communication packets that are exchanged between the client computer and the proxy server, and correlating the first identity with the second identity includes decoding the indication. In an embodiment, the client computer communicates with the proxy server over a first Transmission Control Protocol (TCP) tunnel, the proxy server communicates with the target server over a second TCP tunnel, and correlating the first identity with the second identity includes decoding at least one of the first and second TCP tunnels.

In a disclosed embodiment, the proxy server includes a Hyper-Text Transfer Protocol (HTTP) proxy server. In another embodiment, the proxy server operates in accordance with a SOCKS protocol. In yet another embodiment, the method includes reconstructing and presenting the communication session, as viewed at the client computer, using the correlated first and second identities. Reconstructing and presenting the communication session may include modifying at least some of the received communication packets to imitate a modified session between the client computer and the target server that does not traverse the proxy server, reconstructing the modified session from the modified communication packets and presenting the modified communication session.

There is additionally provided, in accordance with an embodiment that is described herein, apparatus including a network interface and a processor. The network interface is configured to receive communication packets from a communication network. The processor is configured to identify, based on the received communication packets, that a client computer conducts a communication session with a target server via a proxy server, and to correlate a first identity of the client computer with a second identity of the target server by processing the communication packets.

The present disclosure will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that schematically illustrates a web decoding system, in accordance with an embodiment of the present disclosure; and

FIG. 2 is a flow chart that schematically illustrates a method for decoding network traffic that is exchanged over a proxy server, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

Overview

Client computers in computer networks sometimes access servers (e.g., Web-sites) via proxy servers. In some cases, a proxy server may be used innocently, e.g., for improving network performance. In other cases, a client computer may use a proxy server for illegitimate purposes, e.g., in order to conceal its identity from the server or to access servers that are blocked for access. In either case, communication via proxy servers makes it difficult to correlate the identity of the client computer with the identity of the server, and therefore presents a challenge for interception and surveillance applications.

Embodiments that are described herein provide methods and systems for applying surveillance to client computers that communicate via proxy servers. In the context of the present patent application and in the claims, the term “proxy server” refers to any type of server via which a client computer communicates with a target server and which de-correlates the identity of the client computer from the identity of the target server.

Typically, de-correlation means that a packet that is sent or received by the proxy server may contain the identity of the client computer or the identity of the target server, but not both, at least not in explicit form. When communicating via a SOCKS proxy, for example, the true destination address is sent in the packet payload, and is not the IP destination address that appears in the packet header. In addition to various types of proxy servers such as Web-proxy servers and SOCKS proxy servers, the disclosed techniques can be used with other types of intermediary servers such as compression servers.

In some embodiments, a decoding system accepts communication packets from a communication network. Based on the received packets, the decoding system identifies that a certain client computer conducts a communication session with a target server via a proxy server. The decoding system processes the packets so as to correlate the identity of the client computer with the identity of the target server. The correlated identities may comprise, for example, Internet Protocol (IP) addresses or Uniform Resource Locators (URLs).

The decoding system may use various methods for correlating the identity of the client computer with that of the target server, even though the two identities are not explicitly given in the same packet. For example, in some communication protocols the identity of the target server is inserted, in encoded or obfuscated form, into request packets that are sent from the client computer to the proxy server. In some embodiments, the decoding system decodes the request packets and thus detects that the client computer communicates with the target server.

In some embodiments, the decoding system reconstructs the communication session between the client computer and the target server, as viewed by the user of the client computer. The reconstructed session can be presented to an operator. In an example embodiment, the decoding system pre-processes the packets so as to create an artificial session, which is conducted directly between the client computer and the target server and does not pass through the proxy server. The artificial session is then decoded and presented to the operator.

The methods and systems described herein are highly effective in performing surveillance on client computers that communicate via proxy servers, and in particular for decoding and reconstructing communication sessions conducted via proxy servers. As such, the disclosed techniques enhance the surveillance capabilities of Government and law enforcement agencies.

System Description

FIG. 1 is a block diagram that schematically illustrates a web decoding system 20, in accordance with an embodiment of the present disclosure. System 20 accepts communication packets from a communication network 24 in order to apply surveillance operations to network users. In the present example, system 20 decodes, reconstructs and presents communication sessions that are conducted between client computers 28 and target servers 32 via proxy servers 36. Alternatively, however, the disclosed techniques can be used to perform any other suitable surveillance operation. Systems of this sort may be operated, for example, by law enforcement, homeland security or other Government agencies. System 20 may connect to network 24 and receive the packets using any suitable interface, such as using passive probing or packet mirroring.

Network 24 may comprise, for example, the Internet, an intranet of a certain organization, or any other suitable network. Client computers 28 may comprise, for example, personal, mobile or notebook computers, mobile communication terminals such as cellular phones or Personal Digital Assistants (PDAs), or any other suitable computing platform having communication capabilities. Client computers 28 may connect to network 24 using any suitable wireless or wire-line means.

System 20 comprises a network interface 40 for connecting to network 24, and a decoding processor 44 that carries out the methods described herein. In a typical embodiment, processor 44 accepts communication packets from network 24 via interface 40. Based on the received packets, the decoding processor identifies that a certain client computer 28 conducts a communication session with a certain target server 32 via a proxy server 36. By processing the packets, the decoding processor then correlates the identity of the client computer with the identity of the target server, in spite of the fact that communication is carried out via the proxy server. Several examples of techniques that correlate the identities of client computers and target servers are described in greater detail below. In an embodiment, decoding processor 44 reconstructs the communication session, i.e., attempts to recreate the user interface that is viewed by the user of the client computer. The reconstructed session is presented to an operator of system 20.

The system configuration shown in FIG. 1 is chosen purely for the sake of conceptual clarity. In alternative embodiments, any other suitable system configuration can also be used. Although FIG. 1 shows only a single client computer, a single target server and a single proxy server for the sake of clarity, real-life networks often comprise multiple client computers, target servers and proxy servers. Typically, system 20 processes packets pertaining to multiple communication sessions simultaneously.

The elements of system 20 can be implemented in hardware, such as using one or more Application-Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs). Alternatively, some elements of system 20 may be implemented in software, or using a combination of hardware and software elements. In some embodiments, decoding processor 44 comprises a general-purpose computer, which is programmed in software to carry out the functions described herein. The software may be downloaded to the computer in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.

Reconstructing Communication Sessions Conducted Via Proxy Servers

Consider a certain client computer 28 that conducts a communication session with a certain target server 32 via a certain proxy server 36. The client computer may choose to communicate via the proxy server for any reason, for example in order to hide its identity. As another example, the client computer's network may have restricted access to the target server, and the client computer uses the proxy server to work around this restriction.

In some embodiments, target server 32 comprises a Web server and proxy server 36 comprises a Hyper-Text Transfer Protocol (HTTP) proxy server, also referred to as a Web-proxy server. In these embodiments, client computer 28 typically operates a browser application that attempts to access Web pages on target server 32. Instead of communicating with the target server directly, the browser of the client computer communicates with proxy server 36. The client computer's browser sends to the proxy server HTTP request messages that notify the proxy server of the requested Web pages. The proxy server communicates with the target server and relays the requested Web pages to the client computer.

In alternative embodiments, proxy server 36 comprises a SOCKS proxy server, which operates in accordance with the SOCKS protocol. The SOCKS protocol is described, for example, by Leech et al., in Request For Comments (RFC) 1928 of the Internet Engineering Task Force (IETF), entitled “SOCKS Protocol Version 5,” March, 1996, which is incorporated herein by reference. In a SOCKS proxy server, the client computer provides the proxy server with the details of a connection that is to be set up with a target server. The proxy server then establishes two back-to-back connections, one with the client computer and the other with the target server.
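By way of illustration only, the following Python sketch shows how a target identity might be recovered from the payload of a SOCKS version 5 request, using the request layout of RFC 1928 (VER, CMD, RSV, ATYP, DST.ADDR, DST.PORT). The function name parse_socks5_request, the sample client address and the sample request are assumptions made for this example and are not part of the disclosure. The sketch also assumes that the caller supplies the request message itself rather than the preceding method-negotiation message.

```python
import ipaddress
import struct
from typing import Optional, Tuple

# Address-type codes defined in RFC 1928.
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def parse_socks5_request(payload: bytes) -> Optional[Tuple[str, int]]:
    """Return (host, port) carried in a SOCKS5 request, or None if unparseable."""
    # Shortest request: VER, CMD, RSV, ATYP, 1 address byte, 2 port bytes.
    if len(payload) < 7 or payload[0] != 0x05:
        return None
    atyp = payload[3]
    if atyp == ATYP_IPV4:
        start, end = 4, 8
    elif atyp == ATYP_DOMAIN:
        start, end = 5, 5 + payload[4]  # one length octet precedes the name
    elif atyp == ATYP_IPV6:
        start, end = 4, 20
    else:
        return None
    if len(payload) < end + 2:
        return None
    raw = payload[start:end]
    if atyp == ATYP_IPV4:
        host = str(ipaddress.IPv4Address(raw))
    elif atyp == ATYP_IPV6:
        host = str(ipaddress.IPv6Address(raw))
    else:
        host = raw.decode("ascii", errors="replace")
    (port,) = struct.unpack("!H", payload[end:end + 2])  # network byte order
    return host, port


# Hypothetical CONNECT request for example.com:443, seen from client 10.0.0.5.
request = (bytes([0x05, 0x01, 0x00, ATYP_DOMAIN, 11])
           + b"example.com" + struct.pack("!H", 443))
client_ip = "10.0.0.5"  # assumed to come from the IP header of the packet
correlation = (client_ip, parse_socks5_request(request))
```

In this sketch the client identity is taken from the packet header and the target identity from the payload, so that the pair held in correlation associates the two.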

Regardless of the type of proxy server, the communication session typically involves establishing two back-to-back connections: a client-proxy connection 37 between the client computer and the proxy server, and a proxy-target connection 38 between the proxy server and the target server. The specific protocols used over these connections may vary from one type of proxy server to another. Connections 37 and 38 may comprise, for example, Transmission Control Protocol (TCP) streams, IP tunnels or any other suitable connection type.

Typically, any packet that is exchanged over connection 37 or 38 may indicate the identity of the client computer or of the target server, but not both, at least not in explicit form. The correlation between the two identities is known to the proxy server, but not to the target server. In some embodiments, decoding processor 44 processes the packets that are exchanged over at least one of connection 37 and connection 38, in order to correlate the identity of the client computer with the identity of the target server.

In some embodiments, when proxy server 36 comprises an HTTP proxy server, the HTTP request packets that are sent over connection 37 from client computer 28 to proxy server 36 contain the URL of the target server in encoded (obfuscated) form. In an embodiment, decoding processor 44 decodes at least one of the HTTP request packets, so as to extract the URL of the target server. The decoding processor then correlates the extracted URL of the target server with the identity (e.g., IP address) of the client computer that is indicated in the HTTP request packets.
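As a minimal sketch of this extraction, the following Python example reads the request line of an HTTP request sent to a proxy and returns the target it names. The sketch assumes the standard request forms used with HTTP proxies, namely an absolute URL in the request line or a CONNECT request naming host:port. Any proxy-specific encoding or obfuscation of the URL would require its own decoder, which is not shown. The function name extract_http_proxy_target and the sample values are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_http_proxy_target(payload: bytes) -> Optional[str]:
    """Return the target URL (or host:port for CONNECT) named in a proxy request."""
    try:
        request_line = payload.split(b"\r\n", 1)[0].decode("ascii")
        method, target, version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return None  # not a well-formed HTTP request line
    if not version.startswith("HTTP/"):
        return None
    if method == "CONNECT":
        return target  # authority form, e.g. "example.com:443"
    parts = urlsplit(target)
    if parts.scheme in ("http", "https") and parts.netloc:
        return target  # absolute form, as sent to a proxy
    return None  # origin form ("/path"): looks like a direct, non-proxy request


# Hypothetical request observed on the client-proxy connection.
packet_payload = (b"GET http://example.com/index.html HTTP/1.1\r\n"
                  b"Host: example.com\r\n\r\n")
client_ip = "10.0.0.5"  # assumed to come from the IP header of the packet
correlation = (client_ip, extract_http_proxy_target(packet_payload))
```

A request in origin form returns None here, which a decoder could treat as an indication that the packet belongs to a direct session rather than a proxy session.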

FIG. 2 is a flow chart that schematically illustrates a method for decoding network traffic that is exchanged over proxy server 36, in accordance with an embodiment of the present disclosure. The method begins with decoding system 20 receiving (e.g., intercepting) Web traffic from network 24, at an input step 50. The Web traffic typically comprises multiple communication packets that are exchanged over the network.

Decoding processor 44 identifies some of the packets as belonging to a communication session that is conducted between client computer 28 and target server 32 over proxy server 36. The identified packets may belong to client-proxy connection 37, to proxy-target connection 38, or both. Decoding processor 44 extracts and correlates the identity of the client computer and the identity of the target server by processing the identified packets, at a correlation step 58.

Using the correlated identities of the client computer and the target server, processor 44 reconstructs the communication session, as seen by the user of the client computer, at a session decoding step 62. The reconstructed session is presented to an operator of system 20, e.g., on a display or other suitable output device.
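The identification and correlation steps of the method may be sketched, under stated assumptions, as a loop over received packets that builds a table of client identities and the target identities associated with them. In the Python sketch below, the Packet type, the set of known proxy addresses used to identify proxy sessions, and the toy payload decoder are all hypothetical. The disclosure does not specify how proxy sessions are identified, and a real decoder would be protocol-specific.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class Packet:
    """Hypothetical simplified packet: header addresses plus payload."""
    src_ip: str
    dst_ip: str
    payload: bytes


def correlate(
    packets: Iterable[Packet],
    proxy_ips: Set[str],
    decode_target: Callable[[bytes], Optional[str]],
) -> Dict[str, Set[str]]:
    """Map each client IP to the target identities decoded from its proxy traffic."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for pkt in packets:
        # Identification (assumption): a packet addressed to a known proxy
        # belongs to a client-proxy connection.
        if pkt.dst_ip not in proxy_ips:
            continue
        target = decode_target(pkt.payload)  # identity carried in the payload
        if target is not None:
            table[pkt.src_ip].add(target)  # header identity <-> payload identity
    return dict(table)


def toy_decoder(payload: bytes) -> Optional[str]:
    """Stand-in decoder for the example: expects payloads like b'TARGET=<name>'."""
    prefix = b"TARGET="
    return payload[len(prefix):].decode("ascii") if payload.startswith(prefix) else None


packets = [
    Packet("10.0.0.5", "203.0.113.9", b"TARGET=example.com"),
    Packet("10.0.0.5", "203.0.113.9", b"opaque data"),
    Packet("10.0.0.6", "198.51.100.8", b"TARGET=ignored.example"),  # not via proxy
]
table = correlate(packets, {"203.0.113.9"}, toy_decoder)
```

The resulting table could then be consulted during session reconstruction to label each session with both the client and the target identity.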

Pre-Processing of Proxy Sessions

In some embodiments, processor 44 decodes and reconstructs sessions over proxy servers in a two-stage process. In the first stage, processor 44 pre-processes the packets that belong to sessions that are conducted over proxy servers. The pre-processing operation modifies the packets to represent an artificial direct session between the client computer and the target server, which does not pass through the proxy server. In the second stage, processor 44 decodes the modified packets, so as to reconstruct the artificial direct session, and presents the reconstructed session to the operator.

The disclosed technique is useful, for example, when decoding system 20 comprises existing software (or hardware) for decoding, reconstructing and presenting direct sessions that do not use proxy servers. The two-stage process described above enables adding the capability of decoding proxy sessions to such a decoding system.

In some embodiments, the pre-processing operation involves classifying the incoming packets into packets that belong to proxy sessions and packets that belong to non-proxy sessions. Packets belonging to non-proxy sessions are decoded as-is and the sessions are presented to the operator. Packets belonging to proxy sessions are pre-processed as explained above, and then looped back to the classifying operation. Since the pre-processing modifies the packets to appear as a direct session, the looped-back packets will be classified as belonging to non-proxy sessions and provided for decoding. The two-stage configuration described above is shown purely by way of example. In alternative embodiments, any other suitable configuration can be used.
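The classify, pre-process and loop-back flow described above can be sketched in Python as follows. This is an illustrative sketch only. The Packet type, the assumption that the target identity has already been extracted and attached to each packet, and the use of a set of known proxy addresses as the classification criterion are all hypothetical choices made for the example.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Set


@dataclass(frozen=True)
class Packet:
    """Hypothetical simplified packet; `target` is the already-decoded target identity."""
    src: str
    dst: str
    payload: bytes
    target: str


def is_proxy_packet(pkt: Packet, proxies: Set[str]) -> bool:
    """Classification (assumption): either endpoint is a known proxy address."""
    return pkt.src in proxies or pkt.dst in proxies


def preprocess(pkt: Packet, proxies: Set[str]) -> Packet:
    """Rewrite the proxy endpoint so the packet imitates a direct client-target session."""
    if pkt.dst in proxies:
        return replace(pkt, dst=pkt.target)  # client -> proxy becomes client -> target
    return replace(pkt, src=pkt.target)      # proxy -> client becomes target -> client


def run_pipeline(packets: Iterable[Packet], proxies: Set[str],
                 decode_direct: Callable[[Packet], None]) -> None:
    """Classify each packet; loop pre-processed proxy packets back to the classifier."""
    queue = deque(packets)
    while queue:
        pkt = queue.popleft()
        if is_proxy_packet(pkt, proxies):
            queue.appendleft(preprocess(pkt, proxies))  # loop back, keeping order
        else:
            decode_direct(pkt)  # stand-in for an existing direct-session decoder


decoded = []
run_pipeline(
    [
        Packet("10.0.0.5", "203.0.113.9", b"request", target="198.51.100.7"),
        Packet("203.0.113.9", "10.0.0.5", b"response", target="198.51.100.7"),
        Packet("10.0.0.6", "198.51.100.8", b"direct", target="198.51.100.8"),
    ],
    proxies={"203.0.113.9"},
    decode_direct=lambda p: decoded.append((p.src, p.dst)),
)
```

Because the rewritten packets no longer carry a proxy address, they pass the classifier on the second visit and reach the direct-session decoder unchanged in order, which is what allows an existing decoder to be reused.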

It will be appreciated that the embodiments described above are cited by way of example, and that the present disclosure is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present disclosure includes both combinations and sub-combinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.

1. A method, comprising: intercepting, by a decoding system, a packet sent between a client computer and a proxy server or between the proxy server and a target server, the packet comprising a first identity of one of the client computer or the target server in a header of the packet; decoding, by the decoding system, the packet to extract a second identity of the other one of the client computer or the target server from a payload of the packet; correlating the first identity with the second identity; reconstructing and presenting a communication session between the client computer and the target server, as viewed by a user of the client computer, using the correlated first and second identities, the communication session includes the packet.
2. The method of claim 1, wherein the first identity is an Internet Protocol (IP) address of the one of the client computer or the target server.
3. The method of claim 2, wherein the second identity is a Uniform Resource Locator (URL) of the other one of the client computer or the target server.
4. The method of claim 1, wherein the packet includes an encoded form of the second identity.
5. The method of claim 4, wherein the packet is a Hyper-Text Transfer Protocol (HTTP) request packet.
6. The method of claim 1, wherein reconstructing and presenting the communication session comprises modifying the packet to imitate a modified session between the client computer and the target server that does not traverse the proxy server.
7. The method of claim 6, wherein reconstructing and presenting the communication session further comprises: reconstructing the modified session from modified communication packets including the modified packet; and decoding and presenting the modified communication session.
8. The method according to claim 1, wherein the client computer communicates with the proxy server over a first Transmission Control Protocol (TCP) tunnel, wherein the proxy server communicates with the target server over a second TCP tunnel, and wherein correlating the first and second identities comprises decoding at least the first TCP tunnel.
9. The method according to claim 1, wherein the proxy server comprises an HTTP proxy server.
10. The method according to claim 1, wherein the proxy server operates in accordance with a SOCKS protocol.
11. An apparatus, comprising: a network interface, which is configured to intercept a communication packet sent between a client computer and a proxy server or between the proxy server and a target server, the packet comprising a first identity of one of the client computer or the target server in a header of the packet; and a processor, which is configured to decode the packet to extract a second identity of the other one of the client computer or the target server; wherein the processor is further configured to correlate the first identity with the second identity; wherein the processor is further configured to reconstruct and present a communication session between the client computer and the target server, as viewed by a user of the client computer, using the correlated first and second identities, the communication session includes the packet.
12. The apparatus of claim 11, wherein the first identity is an Internet Protocol (IP) address of the one of the client computer or the target server.
13. The apparatus of claim 12, wherein the second identity is a Uniform Resource Locator (URL) of the other one of the client computer or the target server.
14. The apparatus of claim 11, wherein the packet includes an encoded form of the second identity.
15. The apparatus of claim 14, wherein the packet is a Hyper-Text Transfer Protocol (HTTP) request packet.
16. The apparatus of claim 11, wherein the processor is further configured to modify the packet to imitate a modified session between the client computer and the target server that does not traverse the proxy server.
17. The apparatus of claim 16, wherein the processor is further configured to: reconstruct the modified session from modified communication packets including the modified packet; and decode and present the modified communication session.
18. The apparatus of claim 11, wherein the client computer communicates with the proxy server over a first Transmission Control Protocol (TCP) tunnel, wherein the proxy server communicates with the target server over a second TCP tunnel, and wherein the processor is further configured to decode at least the first TCP tunnel to correlate the first identity with the second identity.
19. The apparatus of claim 11, wherein the proxy server comprises an HTTP proxy server.
20. The apparatus of claim 11, wherein the proxy server operates in accordance with a SOCKS protocol.